recursiveloader: use less memory in assert_directory_verifies #2
base: master
Codecov Report

```diff
@@           Coverage Diff            @@
##           master      #2     +/-   ##
==========================================
- Coverage    96.2%   95.84%   -0.36%
==========================================
  Files          21       21
  Lines        4928     5010      +82
==========================================
+ Hits         4741     4802      +61
- Misses        187      208      +21
```

Continue to review full report at Codecov.
Is the memory use a real problem here? Speed is a problem, and a 20% slowdown is not really acceptable.
Realistically, the memory consumption probably will not be an issue for the vast majority of systems where it makes sense to compile ebuilds from source. Loading all Manifests into memory could become a scalability problem if the repository were large enough, but that doesn't seem likely any time soon. Tools like Portage may choose to call gemato as a subprocess to avoid unnecessary memory allocation in the main process. The 20% slowdown can be blamed on the
Well, I was kinda thinking about that and decided that if it ever becomes a problem, we should probably use weak references to let Python unload unused Manifests. However, I've never looked into how well that would work. As for the patch, it's really hard to read. If you still remember what the differences are, could you try updating it to avoid whitespace changes for now? I.e. if you dedented stuff, add
In order to avoid loading all manifest entries into memory at once, make assert_directory_verifies use a shared sort_key to iterate over directories and manifest entries in the same order. For verification of the entire gentoo tree, my tests have shown that this change reduces the memory footprint by about 63%, while consuming about 20% more time.
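A rough sketch of the idea in the description above: if both the directory walk and the Manifest entries are produced in the order given by one shared key, they can be compared in lockstep, holding only the current item from each stream instead of a dict of every entry. The names `sort_key` and `merge_by_key`, the component-wise key, and the `(path, data)` entry tuples are assumptions for illustration, not gemato's actual code.

```python
from typing import Iterable, Iterator, Optional, Tuple


def sort_key(path: str) -> Tuple[str, ...]:
    """Hypothetical shared key: compare paths component by component,
    so 'a/b' sorts right after its parent 'a' rather than after 'a-b'."""
    return tuple(path.split("/"))


def merge_by_key(
    fs_paths: Iterable[str],
    manifest_entries: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, Optional[str], bool]]:
    """Walk two streams already sorted by sort_key in lockstep.

    Yields (path, entry_data, on_disk): entry_data is None when the file
    is not listed in any Manifest, and on_disk is False when a Manifest
    entry has no matching file. Only one item per stream is held at once."""
    fs_it = iter(fs_paths)
    mf_it = iter(manifest_entries)
    fs = next(fs_it, None)
    mf = next(mf_it, None)
    while fs is not None or mf is not None:
        if mf is None or (fs is not None and sort_key(fs) < sort_key(mf[0])):
            yield fs, None, True            # file on disk, not in any Manifest
            fs = next(fs_it, None)
        elif fs is None or sort_key(mf[0]) < sort_key(fs):
            yield mf[0], mf[1], False       # listed in a Manifest, missing on disk
            mf = next(mf_it, None)
        else:
            yield fs, mf[1], True           # matched pair: verify checksums here
            fs = next(fs_it, None)
            mf = next(mf_it, None)
```

Both inputs must be sorted with the same key. If one side used plain string order instead, `a-b` would sort before `a/b` on that side only, and the two streams would fall out of step and report spurious mismatches.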
Updated from 3c3ac54 to a0ce724.
Modified indentation to minimize the diff.
Ok, I think I see what you're trying to do. I still think weak references are a better option here, assuming Python actually frees the Manifests so the memory can be reused for other objects.
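The weak-reference alternative could look roughly like this: Manifests are handed out from a `weakref.WeakValueDictionary`, so a Manifest stays cached only while some caller still holds it and is reloaded on demand afterwards. The `Manifest` stand-in and the `ManifestCache` class are hypothetical; this is a sketch of the technique, not gemato's loader.

```python
import weakref


class Manifest:
    """Hypothetical stand-in for a parsed Manifest."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries = {}  # parsed entries would live here


class ManifestCache:
    """Hypothetical loader that hands out Manifests without pinning them.

    Values are held weakly: once no caller keeps a strong reference,
    CPython's reference counting frees the Manifest and its cache slot
    disappears, so the next lookup parses it again."""

    def __init__(self) -> None:
        self._cache = weakref.WeakValueDictionary()
        self.loads = 0  # counts how many times a Manifest was (re)parsed

    def get(self, path: str) -> Manifest:
        m = self._cache.get(path)
        if m is None:
            m = Manifest(path)  # placeholder for actual parsing
            self.loads += 1
            self._cache[path] = m
        return m
```

This only saves memory if nothing else keeps strong references; for example, a parent Manifest that stores its sub-Manifests as attributes would keep the whole tree alive. On interpreters without reference counting, freeing may also be deferred until a garbage-collection pass.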
It doesn't seem possible to implement get_file_entry_dict without having all the Manifests loaded into memory at once, which is why I added the _iter_file_entry_dict method to use instead.
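The difference between a dict-returning method and an iterating one can be sketched as follows. Both functions, the `load` callback and the `(path, data)` entry tuples are simplified, hypothetical stand-ins, not the real `get_file_entry_dict` or `_iter_file_entry_dict` signatures.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Entry = Tuple[str, str]  # (path, data): simplified stand-in for a Manifest entry


def get_file_entry_dict(manifest_paths: List[str],
                        load: Callable[[str], List[Entry]]) -> Dict[str, str]:
    """Eager: every Manifest is loaded before the caller sees any entry,
    and all entries stay referenced by the returned dict."""
    result: Dict[str, str] = {}
    for mpath in manifest_paths:
        for path, data in load(mpath):
            result[path] = data
    return result


def iter_file_entries(manifest_paths: List[str],
                      load: Callable[[str], List[Entry]]) -> Iterator[Entry]:
    """Lazy: a Manifest is loaded only when the consumer reaches it, and its
    entries become collectable once the generator moves on to the next one."""
    for mpath in manifest_paths:
        yield from load(mpath)
```

The lazy form gives up lookup by path, so the caller has to consume entries in a known order, which is what pairing it with a traversal sorted by the same key provides.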
@mgorny, maybe we can come up with something to accomplish the goal of this patch while being a little less tricky?